Parallel fast Fourier transformation method of concealed-communication type

ABSTRACT

In a 3-dimensional fast Fourier transformation implemented on a parallel-processing computer, the overhead caused by transfers of data between the processors of the computer is reduced in order to increase the efficiency of parallel processing. To reduce the overhead, the data is divided into data elements each having an even X coordinate and data elements each having an odd X coordinate. In processing 34, the data elements each having an odd X coordinate are subjected to the transformation in the Y direction while the data elements each having an even X coordinate are subjected to a process of permutation among the processors at the same time. In processing 35, on the other hand, the data elements each having an even X coordinate are subjected to the transformation in the X direction while the data elements each having an odd X coordinate are subjected to the process of permutation among the processors at the same time. As a result, the communication time it takes to carry out the process of permutation among the processors can be concealed behind the processing time, so that the efficiency of parallel processing can be increased over that of the conventional method.

FIELD OF THE INVENTION

[0001] The present invention relates to a method for carrying out a fast Fourier transformation by using a parallel-processing computer having a distributed-memory configuration.

BACKGROUND OF THE INVENTION

[0002] Large-scale simulations each handling up to several millions of variables are required in scientific and technical computations such as calculations to find characteristics of a semiconductor device, calculations to determine states of electrons and calculations to forecast the weather. As a means for dealing with such large-scale problems, a parallel-processing computer, especially a parallel-processing computer having the so-called distributed-memory configuration, is powerful. The parallel-processing computer having a distributed-memory configuration is a system comprising a plurality of processors, each having its own memory, connected to each other by a network. In comparison with the conventional sequential-processing computer, the parallel-processing computer having a distributed-memory configuration offers the advantage that its peak performance can be raised to as high a level as desired by increasing the number of processors employed therein.

[0003] In the parallel-processing computer having a distributed-memory configuration, pieces of data serving as an object of calculation are stored in memories distributed among the processors so that the processors are capable of carrying out computations on the pieces of data in parallel. If a specific one of the processors requires data owned by another processor in the course of processing, the specific processor must wait for the required data to be transferred from the other processor before continuing the processing. Thus, in general, the parallel-processing computer having a distributed-memory configuration incurs an overhead of time required for transferring data from one processor to another in addition to the processing time. For this reason, in order to increase the efficiency of computation, it is necessary to adopt a computation method exhibiting such a high degree of processing parallelism that the time required for communication between processors is as short as possible. In addition, many parallel-processing computers having a distributed-memory configuration include a mechanism for transferring data from a specific one of the processors to another processor while the specific processor is processing other data. In this configuration, if it is possible to contrive a computation method capable of carrying out processing of data and transfers of other data at the same time, the time it takes to transfer the other data can be concealed behind the processing time so that the efficiency of computation can be raised.

[0004] The Fourier transformation is one of the processes carried out frequently in scientific calculations. The Fourier transformation is a process of expressing a function f(x) having complex values defined in an interval of real numbers as a superposition of complex exponential functions exp(ikx). In an implementation on a computer, only a finite number of points can be handled, so the Fourier transformation becomes a process of expressing a series of points f₀, f₁, . . . , f_(N-1), each representing a complex number, as a superposition of N complex exponential functions exp(2πikj/N), where symbol k represents every integer in the range 0, 1, . . . , (N−1), symbol i denotes the imaginary-number unit and symbol π denotes the ratio of the circumference of a circle to the diameter thereof, as follows:

$$f_j = \sum_{k=0}^{N-1} c_k \exp(2\pi i k j / N)$$

[0005] where symbol j represents every integer in the range 0, 1, . . . , (N−1). That is to say, for the given f₀, f₁, . . . , f_(N-1), the Fourier transformation is a process of finding superposition coefficients c₀, c₁, . . . , c_(N-1). As commonly known, the superposition coefficients c₀, c₁, . . . , c_(N-1) can be found from the following equation:

$$c_k = \frac{1}{N} \sum_{j=0}^{N-1} f_j \exp(-2\pi i k j / N)$$

[0006] where symbol k represents every integer in the range 0, 1, . . . , (N−1). If the calculation is carried out on the basis of the above definitions, however, N equations each comprising N terms must be evaluated. Thus, in addition to calculation of the complex exponential functions exp(−2πikj/N), additions and multiplications of complex numbers must be carried out N² times. In order to reduce this large amount of calculation, a technique known as the fast Fourier transformation is widely adopted in practice. The fast Fourier transformation is a technique for reducing the amount of computation to the order of N log N by devising an algorithm for the Fourier transformation. The fast Fourier transformation is described in detail in documents such as a reference authored by G. Golub and C. F. van Loan with a title of “Matrix Computations”, 3rd edition, published by The Johns Hopkins University Press, 1996, pp. 189-192.
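The difference between the direct evaluation and the fast Fourier transformation can be illustrated with a minimal Python sketch. It is not taken from the cited reference: the function names `dft`, `fft` and `fft_coefficients` are hypothetical, the recursive radix-2 formulation is only one of several possible algorithms, and N is assumed to be a power of two.

```python
import cmath

def dft(f):
    """Direct evaluation of c_k = (1/N) * sum_j f_j * exp(-2*pi*i*k*j/N); about N^2 operations."""
    n = len(f)
    return [sum(f[j] * cmath.exp(-2j * cmath.pi * k * j / n) for j in range(n)) / n
            for k in range(n)]

def fft(f):
    """Recursive radix-2 transform without the 1/N factor; len(f) must be a power of two."""
    n = len(f)
    if n == 1:
        return list(f)
    even = fft(f[0::2])   # half-length transform of the even-indexed points
    odd = fft(f[1::2])    # half-length transform of the odd-indexed points
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def fft_coefficients(f):
    """Superposition coefficients c_k computed with about N*log2(N) operations."""
    n = len(f)
    return [c / n for c in fft(f)]
```

Both routines return the same coefficients up to rounding error; only the operation count differs.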

[0007] The Fourier transformation described above is called a 1-dimensional Fourier transformation. However, a 3-dimensional Fourier transformation is applied to computations such as the calculations to find characteristics of a semiconductor device, the calculations to determine states of electrons and the calculations to forecast the weather. The 3-dimensional Fourier transformation is a process to express complex-number data {f_(jx, jy, jz)} having 3 subscripts j_(x), j_(y) and j_(z), where symbol j_(x) represents every integer in the range 0, 1, . . . , (N_(x)−1), symbol j_(y) represents every integer in the range 0, 1, . . . , (N_(y)−1) and symbol j_(z) represents every integer in the range 0, 1, . . . , (N_(z)−1), as a superposition of N_(x)×N_(y)×N_(z) complex exponential functions exp(−2πik_(x)j_(x)/N_(x)) exp(−2πik_(y)j_(y)/N_(y)) exp(−2πik_(z)j_(z)/N_(z)), where symbol k_(x) represents every integer in the range 0, 1, . . . , (N_(x)−1), symbol k_(y) represents every integer in the range 0, 1, . . . , (N_(y)−1) and symbol k_(z) represents every integer in the range 0, 1, . . . , (N_(z)−1), as follows:

$$f_{j_x, j_y, j_z} = \sum_{k_x=0}^{N_x-1} \sum_{k_y=0}^{N_y-1} \sum_{k_z=0}^{N_z-1} c_{k_x, k_y, k_z} \exp(-2\pi i k_x j_x / N_x) \exp(-2\pi i k_y j_y / N_y) \exp(-2\pi i k_z j_z / N_z)$$

[0008] where symbol j_(x) represents every integer in the range 0, 1, . . . , (N_(x)−1), symbol j_(y) represents every integer in the range 0, 1, . . . , (N_(y)−1) and symbol j_(z) represents every integer in the range 0, 1, . . . , (N_(z)−1). That is to say, for the given {f_(jx, jy, jz)}, the 3-dimensional Fourier transformation is a process of finding the superposition coefficients {c_(kx, ky, kz)}. As commonly known, the superposition coefficients {c_(kx, ky, kz)} can be found from the following equation:

$$c_{k_x, k_y, k_z} = \sum_{j_x=0}^{N_x-1} \sum_{j_y=0}^{N_y-1} \sum_{j_z=0}^{N_z-1} f_{j_x, j_y, j_z} \exp(-2\pi i k_x j_x / N_x) \exp(-2\pi i k_y j_y / N_y) \exp(-2\pi i k_z j_z / N_z)$$

[0009] where symbol k_(x) represents every integer in the range 0, 1, . . . , (N_(x)−1), symbol k_(y) represents every integer in the range 0, 1, . . . , (N_(y)−1) and symbol k_(z) represents every integer in the range 0, 1, . . . , (N_(z)−1).

[0010] Furthermore, it is easy to show that the above equation can be solved by sequentially carrying out the following three transformations:

[0011] <Transformation in the Y Direction>

$$c^{(1)}_{j_x, k_y, j_z} = \sum_{j_y=0}^{N_y-1} f_{j_x, j_y, j_z} \exp(-2\pi i k_y j_y / N_y)$$

[0012] where symbol j_(x) represents every integer in the range 0, 1, . . . , (N_(x)−1), symbol k_(y) represents every integer in the range 0, 1, . . . , (N_(y)−1) and symbol j_(z) represents every integer in the range 0, 1, . . . , (N_(z)−1).

[0013] <Transformation in the X Direction>

$$c^{(2)}_{k_x, k_y, j_z} = \sum_{j_x=0}^{N_x-1} c^{(1)}_{j_x, k_y, j_z} \exp(-2\pi i k_x j_x / N_x)$$

[0014] where symbol k_(x) represents every integer in the range 0, 1, . . . , (N_(x)−1), symbol k_(y) represents every integer in the range 0, 1, . . . , (N_(y)−1) and symbol j_(z) represents every integer in the range 0, 1, . . . , (N_(z)−1).

[0015] <Transformation in the Z Direction>

$$c_{k_x, k_y, k_z} = \sum_{j_z=0}^{N_z-1} c^{(2)}_{k_x, k_y, j_z} \exp(-2\pi i k_z j_z / N_z)$$

[0016] where symbol k_(x) represents every integer in the range 0, 1, . . . , (N_(x)−1), symbol k_(y) represents every integer in the range 0, 1, . . . , (N_(y)−1) and symbol k_(z) represents every integer in the range 0, 1, . . . , (N_(z)−1).
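The equivalence between the triple sum and the three sequential 1-dimensional transformations can be checked numerically. The following Python sketch is illustrative only: the helper names `w`, `dft3_direct` and `dft3_by_axes` are hypothetical, the data is held as a nested list indexed `f[jx][jy][jz]`, and each 1-dimensional transformation is evaluated directly rather than by a fast algorithm.

```python
import cmath

def w(k, j, n):
    """Complex exponential factor exp(-2*pi*i*k*j/n)."""
    return cmath.exp(-2j * cmath.pi * k * j / n)

def dft3_direct(f, nx, ny, nz):
    """Triple sum over jx, jy, jz for every (kx, ky, kz)."""
    return [[[sum(f[jx][jy][jz] * w(kx, jx, nx) * w(ky, jy, ny) * w(kz, jz, nz)
                  for jx in range(nx) for jy in range(ny) for jz in range(nz))
              for kz in range(nz)] for ky in range(ny)] for kx in range(nx)]

def dft3_by_axes(f, nx, ny, nz):
    """Same result obtained by transformations in the Y, X and Z directions in turn."""
    # Y direction: c1[jx][ky][jz]
    c1 = [[[sum(f[jx][jy][jz] * w(ky, jy, ny) for jy in range(ny))
            for jz in range(nz)] for ky in range(ny)] for jx in range(nx)]
    # X direction: c2[kx][ky][jz]
    c2 = [[[sum(c1[jx][ky][jz] * w(kx, jx, nx) for jx in range(nx))
            for jz in range(nz)] for ky in range(ny)] for kx in range(nx)]
    # Z direction: c[kx][ky][kz]
    return [[[sum(c2[kx][ky][jz] * w(kz, jz, nz) for jz in range(nz))
              for kz in range(nz)] for ky in range(ny)] for kx in range(nx)]
```

For small sizes such as N_(x)=2, N_(y)=3 and N_(z)=4, the two functions agree up to rounding error, while the axis-by-axis version needs far fewer multiplications.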

[0017] As is obvious from the above equations, the transformation in the Y direction is a 1-dimensional Fourier transformation carried out on N_(y) pieces of data having the same subscripts j_(x) and j_(z). Such a transformation is carried out N_(x)×N_(z) times, once for each combination of subscripts j_(x) and j_(z), in order to complete the transformation in the Y direction. The transformations in the X and Z directions are carried out in the same way as the transformation in the Y direction. Thus, as indicated by reference numeral 1 shown in FIG. 2, if pieces of 3-dimensional data {f_(jx, jy, jz)} are arranged to form a rectangular solid with dimensions of N_(x)×N_(y)×N_(z), where symbols N_(x), N_(y) and N_(z) denote the lengths of its sides, the transformation in the Y direction is a 1-dimensional Fourier transformation carried out on N_(y) pieces of data 2, which are parallel to the Y axis. By the same token, the transformation in the X direction is a 1-dimensional Fourier transformation carried out on N_(x) pieces of data 3, which are parallel to the X axis. Likewise, the transformation in the Z direction is a 1-dimensional Fourier transformation carried out on N_(z) pieces of data 4, which are parallel to the Z axis. It is obvious that, by adoption of this method of computation, in the transformation in the Y direction, calculations for sets of data with different X coordinates or different Z coordinates can be carried out concurrently. By the same token, it is also obvious that, in the transformation in the X direction, calculations for sets of data with different Y coordinates or different Z coordinates can be carried out concurrently. Similarly, it is obvious as well that, in the transformation in the Z direction, calculations for sets of data with different X coordinates or different Y coordinates can be carried out concurrently.

[0018] Traditionally, a method utilizing the parallelism described above is generally adopted in execution of the 3-dimensional fast Fourier transformation using a parallel-processing computer having a distributed-memory configuration. An example of such a method is referred to as a permutation algorithm, which is an efficient technique of reducing the amount of data transferred between processors to a minimum. This efficient technique is described in detail in documents such as a reference authored by V. Kumar, A. Grama, A. Gupta and G. Karypis with a title of “Introduction to Parallel Computing”, published by The Benjamin/Cummings Publishing Company, 1994, pp. 377-406. In accordance with this method, as shown in FIG. 3, first of all, 3-dimensional data is split into as many pieces of data 5, each arranged on a plane perpendicular to the Z axis, as there are processors, and the pieces of data 5 are each stored in a memory provided for one of the processors in a distributed-memory configuration. Then, in this state, a transformation in the Y direction is carried out. It is obvious that, since in this state all N_(y) pieces of data 2 required in a transformation in the Y direction reside in a single processor, the transformation in the Y direction can be carried out without the need to transfer data between processors. After the transformation in the Y direction is completed, the technique of splitting data is changed. This time, the 3-dimensional data is split into as many pieces of data 6, each arranged on a plane perpendicular to the Y axis, as there are processors, and the pieces of data 6 are each stored in a memory provided for one of the processors in a distributed-memory configuration. In consequence, every processor needs to carry out a process to transfer data to all other processors. This process is referred to as permutation. After the permutation process is completed, however, each processor has all N_(x) pieces of data 3 required in the transformation in the X direction in itself. Thus, the transformation in the X direction can be carried out without the need to transfer data between processors. In addition, also in the case of the transformation in the Z direction, each processor has all N_(z) pieces of data 4 required in the transformation in the Z direction in itself. Thus, the transformation in the Z direction can be carried out without the need to transfer data between processors. In this way, the 3-dimensional Fourier transformation can be completed. The above description explains the use of a parallel-processing computer having a distributed-memory configuration to implement a method of carrying out the 3-dimensional fast Fourier transformation.

[0019] In accordance with the parallel computing method based on the permutation algorithm described above, the transformations in the Y, X and Z directions can be carried out in the processors in a completely independent way. In the permutation process carried out in the course of computing, however, every processor needs to transfer data to all other processors. In general, in a parallel-processing computer having a distributed-memory configuration, it takes much time to transfer data in comparison with the processing time itself. This tendency has become more and more pronounced as the processing speed of contemporary processors has increased. In addition, in recent years, PC clusters are widely used. A PC cluster is a number of personal computers (PCs) connected to each other by using a network such as the Internet (a registered trademark). In the case of a PC cluster, the capability to transfer data among the personal computers is low in comparison with a parallel-processing computer having a distributed-memory configuration. Thus, in particular, the capability to transfer data among the personal computers most likely becomes a bottleneck of the processing time. As is obvious from the background described above, in many cases, the conventional method based on the permutation algorithm does not assure sufficient parallel-processing performance in the use of a parallel-processing computer having a distributed-memory configuration for execution of the 3-dimensional fast Fourier transformation. It is thus an object of the present invention to solve this problem.

SUMMARY OF THE INVENTION

[0020] In order to achieve the object cited above, in accordance with the present invention, the Y-direction and X-direction transformations based on the permutation algorithm are each split into two partial processes and the processing order is changed so as to allow communications and calculative processing to overlap each other. To this end, the present invention focuses on the following transformation properties:

[0021] (1) The transformation in the Y direction exhibits parallelism with respect to the X direction, that is, transformations on sets of data with different X coordinates are independent of each other. Thus, each processor is capable of carrying out the transformations on the N_(x) sets of data assigned to it in any arbitrary order.

[0022] (2) The transformation in the X direction comprises log₂(N_(x)) steps. At the first log₂(N_(x))−1 steps of the steps composing the transformation in the X direction, data elements each having an even X coordinate are processed only in conjunction with data elements each also having an even X coordinate, whereas data elements each having an odd X coordinate are processed only in conjunction with data elements each also having an odd X coordinate.

[0023] The case of N_(x)=8 is taken as an example. In this case, the flow of a computation carried out on a data set in the transformation in the X direction is shown in FIG. 4. In the flow diagram, eight circles 7 arranged vertically each denote 1 data element, and a line 8 connecting circles 7 indicates that a computation of a value of the circle 7 on the right end of the line 8 requires the value of the circle 7 on the left end of the line 8. As is obvious from the figure, at the first log₂(N_(x))−1=2 steps of the transformation, data elements each having an even X coordinate are processed only in conjunction with data elements each also having an even X coordinate, whereas data elements each having an odd X coordinate are processed only in conjunction with data elements each also having an odd X coordinate. Accordingly, it is clear that transformation property (2) described above holds true. This transformation property applies not only to the special case of N_(x)=8, but also to general cases.
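Property (2) can be checked mechanically for any power-of-two length. The Python sketch below rests on an assumption that is not stated in the text: the steps are taken to follow a decimation-in-frequency ordering on naturally ordered input, in which the element with index j is combined at step s (counted from 0) with the element whose index differs by N/2^(s+1). The function names are hypothetical.

```python
def butterfly_partner(j, step, n):
    """Index combined with j at the given step (0-based) of a length-n transform.

    Assumes a decimation-in-frequency ordering on naturally ordered input,
    where the pairing distance at step s is n / 2**(s + 1).
    """
    return j ^ (n >> (step + 1))

def parity_preserving_steps(n):
    """For each step, True if every element is paired with one of the same parity."""
    steps = n.bit_length() - 1  # log2(n) when n is a power of two
    return [all(j % 2 == butterfly_partner(j, s, n) % 2 for j in range(n))
            for s in range(steps)]

# For n = 8 the pairing distances are 4, 2 and 1, so only the last step mixes
# even and odd indices: parity_preserving_steps(8) gives [True, True, False].
```

Under this assumption the pairing distance is even at every step except the last, which is why even-indexed and odd-indexed elements stay separate until the final step.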

[0024] If the two properties, i.e., properties (1) and (2) described above, are utilized, the Y-direction and X-direction transformations based on the permutation algorithm can be divided into the following 5 pieces of processing:

[0025] <Processing 1>

[0026] A transformation in the Y direction is carried out only on data elements each having an even X coordinate.

[0027] <Processing 2>

[0028] A transformation in the Y direction is carried out only on data elements each having an odd X coordinate. At the same time, a permutation process is carried out only on data elements each having an even X coordinate.

[0029] <Processing 3>

[0030] The first log₂(N_(x))−1 steps of the transformation in the X direction are executed only on data elements each having an even X coordinate. At the same time, a permutation process is carried out only on data elements each having an odd X coordinate.

[0031] <Processing 4>

[0032] The first log₂(N_(x))−1 steps of the transformation in the X direction are executed only on data elements each having an odd X coordinate.

[0033] <Processing 5>

[0034] The last step of the transformation in the X direction is executed.

[0035] The states of processing 1 to processing 4 according to the present invention are shown in FIG. 5. In the figure, hatched portions 9 and 10 represent data elements being subjected to transformation calculations, a gray portion 11 represents data elements being subjected to data transfers and a white portion 12 represents data elements not being subjected to any operations. A line in the rectangular solid represents division of data into data portions each assigned to a processor. In accordance with the computation method provided by the present invention, in processing 2, while a transformation in the Y direction is being carried out on data elements each having an odd X coordinate, a permutation process can be carried out only on data elements each having an even X coordinate. In processing 3, on the other hand, while the first log₂(N_(x))−1 steps of the transformation in the X direction are being carried out on data elements each having an even X coordinate, a permutation process can be carried out only on data elements each having an odd X coordinate. In this way, the time it takes to transfer data can be concealed behind the processing time and, hence, the efficiency of computation can be increased.
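The effect of the reordering on elapsed time can be illustrated with a simple cost model. The Python sketch below is an idealized assumption and not a measurement: it supposes that computation and communication overlap perfectly, that the even and odd halves each take half of the corresponding time, and that all times are given in arbitrary units. The function and parameter names are hypothetical.

```python
def conventional_time(t_y, t_perm, t_x_first, t_x_last):
    """Y transformation, then permutation, then X transformation, with no overlap."""
    return t_y + t_perm + t_x_first + t_x_last

def overlapped_time(t_y, t_perm, t_x_first, t_x_last):
    """Idealized elapsed time of processing 1 to processing 5.

    t_x_first is the time of the first log2(Nx)-1 steps of the X transformation
    and t_x_last is the time of its last step.
    """
    half_y, half_perm, half_x = t_y / 2, t_perm / 2, t_x_first / 2
    return (half_y                      # processing 1: Y transform, even X
            + max(half_y, half_perm)    # processing 2: Y transform odd X, permute even X
            + max(half_x, half_perm)    # processing 3: X steps even X, permute odd X
            + half_x                    # processing 4: X steps, odd X
            + t_x_last)                 # processing 5: last X step
```

With the illustrative values t_y=4, t_perm=4, t_x_first=3 and t_x_last=1, the model gives 12 units for the conventional order and 8.5 units for the reordered processing. In this model the permutation is fully concealed only when each half of it takes no longer than the computation it overlaps.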

BRIEF DESCRIPTION OF THE DRAWINGS

[0036] FIG. 1 shows a flowchart representing a computation technique of Y-direction and X-direction transformations adopting a parallel 3-dimensional fast Fourier transformation method according to the present invention;

[0037] FIG. 2 is a diagram showing the conventional 3-dimensional fast Fourier transformation method;

[0038] FIG. 3 is a diagram showing the conventional parallel 3-dimensional fast Fourier transformation method;

[0039] FIG. 4 is a diagram showing dependence relations of data in the transformation carried out in the X direction;

[0040] FIG. 5 is a diagram showing a computation technique of Y-direction and X-direction transformations adopting a parallel 3-dimensional fast Fourier transformation method according to the present invention;

[0041] FIG. 6 is a diagram showing the configuration of a parallel-processing computer having a distributed-memory configuration as a computer to which the present invention is to be applied;

[0042] FIG. 7 shows a flowchart representing the conventional parallel 3-dimensional fast Fourier transformation method;

[0043] FIG. 8 shows a flowchart representing operations of a parallel 3-dimensional fast Fourier transformation library according to the present invention; and

[0044] FIG. 9 shows a flowchart representing a weather forecast computation using the parallel-processing computer having a distributed-memory configuration according to the present invention.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

[0045] <<First Embodiment>>

[0046] (1) Schematic Configuration of the Computer

[0047] An embodiment and the principle of the present invention will be described in detail by referring to the diagrams. The embodiment implements a parallel-processing computer having a distributed-memory configuration as a computer for solving partial-differential equations by using a 3-dimensional Fourier transformation. To be more specific, the parallel-processing computer having a distributed-memory configuration is used as a means for running a simulation. In this particular case, as a typical simulation, calculations for weather forecasting are explained.

[0048] FIG. 6 is a diagram showing a parallel-processing computer system for executing a parallel-processing program implementing a computation method provided by the present invention. As shown in the figure, the parallel-processing computer system comprises an input apparatus 13, a processing apparatus 17, an output apparatus 18 and an external storage apparatus 21. The input apparatus 13 is an apparatus for inputting parameters such as the shape of the computation area, initial conditions and material constants. The processing apparatus 17 comprises P processors 15 and a network 16. The processors 15 each have a memory 14. The network 16 is used for transferring data between the memories 14 employed in the processors 15. The output apparatus 18 is an apparatus for outputting results of computation. The external storage apparatus 21 is an apparatus for storing a program 19 and data 20. Data can be transferred between the memories 14 employed in the processors 15 at the same time as processing is carried out by the processors 15.

[0049] (2) 3-Dimensional Parallel-Processing Fast Fourier Transformation Based on the Conventional Method

[0050] By using mathematical expressions, the following description explains the principle of the 3-dimensional fast Fourier transformation executed on a parallel-processing computer having a distributed-memory configuration. The 3-dimensional Fourier transformation is a process to find N_(x)×N_(y)×N_(z) pieces of complex output data {c_(kx, ky, kz)} from N_(x)×N_(y)×N_(z) pieces of complex input data {f_(jx, jy, jz)} by using Eq. (1) given as follows:

$$c_{k_x, k_y, k_z} = \sum_{j_x=0}^{N_x-1} \sum_{j_y=0}^{N_y-1} \sum_{j_z=0}^{N_z-1} f_{j_x, j_y, j_z} \exp(-2\pi i k_x j_x / N_x) \exp(-2\pi i k_y j_y / N_y) \exp(-2\pi i k_z j_z / N_z) \qquad (1)$$

[0051] where symbol k_(x) denotes every integer in the range 0, 1, . . . , (N_(x)−1), symbol k_(y) denotes every integer in the range 0, 1, . . . , (N_(y)−1) and symbol k_(z) denotes every integer in the range 0, 1, . . . , (N_(z)−1).

[0052] It is readily obvious that Eq. (1) can be solved by sequentially carrying out transformations in the Y, X and Z directions, which are expressed by Eqs. (2), (3) and (4) respectively as follows:

$$c^{(1)}_{j_x, k_y, j_z} = \sum_{j_y=0}^{N_y-1} f_{j_x, j_y, j_z} \exp(-2\pi i k_y j_y / N_y) \qquad (2)$$

[0053] where symbol j_(x) denotes every integer in the range 0, 1, . . . , (N_(x)−1), symbol k_(y) denotes every integer in the range 0, 1, . . . , (N_(y)−1) and symbol j_(z) denotes every integer in the range 0, 1, . . . , (N_(z)−1).

$$c^{(2)}_{k_x, k_y, j_z} = \sum_{j_x=0}^{N_x-1} c^{(1)}_{j_x, k_y, j_z} \exp(-2\pi i k_x j_x / N_x) \qquad (3)$$

[0054] where symbol k_(x) denotes every integer in the range 0, 1, . . . , (N_(x)−1), symbol k_(y) denotes every integer in the range 0, 1, . . . , (N_(y)−1) and symbol j_(z) denotes every integer in the range 0, 1, . . . , (N_(z)−1).

$$c_{k_x, k_y, k_z} = \sum_{j_z=0}^{N_z-1} c^{(2)}_{k_x, k_y, j_z} \exp(-2\pi i k_z j_z / N_z) \qquad (4)$$

[0055] where symbol k_(x) denotes every integer in the range 0, 1, . . . , (N_(x)−1), symbol k_(y) denotes every integer in the range 0, 1, . . . , (N_(y)−1) and symbol k_(z) denotes every integer in the range 0, 1, . . . , (N_(z)−1).

[0056] These transformations are carried out in the parallel-processing computer having a distributed-memory configuration by performing N_(x)×N_(z) independent transformations expressed by Eq. (2), N_(y)×N_(z) independent transformations expressed by Eq. (3) and N_(x)×N_(y) independent transformations expressed by Eq. (4). A computation method based on this concept is the permutation algorithm described in the section entitled ‘Background of the Invention.’ FIG. 7 shows a flowchart representing a computation based on the permutation algorithm. The computation mainly comprises the following four processes:

[0057] <Transformation in the Y Direction>

[0058] In processing 23 of the flowchart shown in FIG. 7, 3-dimensional data f_(jx, jy, jz) is received from the input apparatus. In the next processing 24, the data is arranged to form a rectangular solid. In the next processing 25, the data is divided into as many data portions, each located on a plane perpendicular to the Z axis, as there are processors, and the data portions are each stored in a memory provided for one of the processors in a distributed-memory configuration.

[0059] As a technique to divide data, it is possible to adopt one of a variety of methods such as a block division method and a cyclic division method. For details of the data division method, refer to a reference authored by V. Kumar, A. Grama, A. Gupta and G. Karypis with a title of “Introduction to Parallel Computing,” published by The Benjamin/Cummings Publishing Company in 1994. In accordance with the block division method, for example, processor p where 0≦p≦(P−1) is designated as a processor having (N_(x)×N_(y)×N_(z))/P pieces of data with a Z coordinate j_(z) satisfying the following relation:

(N_(z)/P)×p≦j_(z)≦(N_(z)/P)×(p+1)−1
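The block division relation can be written as a pair of small helper functions. The Python sketch below is illustrative: the names `block_range` and `owner` are hypothetical, and the number of processors is assumed to divide the number of planes evenly.

```python
def block_range(p, n, num_procs):
    """Inclusive coordinate range (low, high) assigned to processor p.

    Assumes num_procs divides n evenly, as the relation above implies.
    """
    size = n // num_procs
    return size * p, size * (p + 1) - 1

def owner(j, n, num_procs):
    """Processor that holds coordinate j under the same block division."""
    return j // (n // num_procs)

# With N_z = 8 planes and P = 4 processors, processor 1 holds planes 2 to 3,
# and plane 5 belongs to processor 2.
```

The same functions describe the division along the Y axis when `n` is set to N_(y).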

[0060] With the data divided into data portions as described above, in the next processing 26, each processor carries out a transformation in the Y direction on a data portion stored in a memory provided for the processor in accordance with Eq. (2).

[0061] <Permutation Process>

[0062] In the next processing 27, the data is divided into as many data portions, each located on a plane perpendicular to the Y axis, as there are processors, as indicated by reference numeral 6 in FIG. 3, and the data portions are each stored in a memory provided for one of the processors in a distributed-memory configuration. Also in this case, as a technique to divide data, it is possible to adopt one of a variety of methods. In accordance with the block division method, for example, processor p where 0≦p≦(P−1) is designated as a processor having (N_(x)×N_(y)×N_(z))/P pieces of data with a Y coordinate k_(y) satisfying the following relation:

(N_(y)/P)×p≦k_(y)≦(N_(y)/P)×(p+1)−1

[0063] In order to implement the modified data division method described above, processor p transfers some of its data, which exists in processor p after the transformation carried out in the Y direction, to processor p′. To be more specific, processor p provides processor p′ with N_(x)×(N_(y)/P)×(N_(z)/P) pieces of data with a Y coordinate k_(y) satisfying the following relation:

(N_(y)/P)×p′≦k_(y)≦(N_(y)/P)×(p′+1)−1

[0064] The process to transfer such data is referred to as the permutation process.
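The movement of data in the permutation process can be simulated without any real communication. The Python sketch below is an illustration under stated assumptions: each processor's memory is modeled as a set of index triples (jx, ky, jz), block division is used along both axes, P divides N_(y) and N_(z) evenly, and the function names are hypothetical.

```python
def z_slab_layout(nx, ny, nz, num_procs):
    """memory[p] holds every (jx, ky, jz) whose jz lies in processor p's block."""
    bz = nz // num_procs
    return [{(jx, ky, jz) for jx in range(nx) for ky in range(ny)
             for jz in range(bz * p, bz * (p + 1))} for p in range(num_procs)]

def permute(memory, ny, num_procs):
    """Redistribute elements so that processor q holds the block of ky values q owns.

    Returns the new layout and sent[p][q], the number of elements that
    processor p hands to processor q (including those it keeps when p == q).
    """
    by = ny // num_procs
    new_memory = [set() for _ in range(num_procs)]
    sent = [[0] * num_procs for _ in range(num_procs)]
    for p, items in enumerate(memory):
        for (jx, ky, jz) in items:
            dest = ky // by
            new_memory[dest].add((jx, ky, jz))
            sent[p][dest] += 1
    return new_memory, sent
```

For N_(x)=N_(y)=N_(z)=4 and P=2, every entry of `sent` equals 16 in this sketch, and after the redistribution each processor holds complete lines of data along both the X axis and the Z axis for its own range of Y coordinates.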

[0065] <Transformation in the X Direction>

[0066] In the next processing 28, each processor carries out a transformation in the X direction on a data portion stored in a memory provided for the processor in accordance with Eq. (3).

[0067] <Transformation in the Z Direction>

[0068] In the next processing 29, each processor carries out a transformation in the Z direction on a data portion stored in a memory provided for the processor in accordance with Eq. (4). In the next processing 30, data obtained as results of the transformation is supplied to the output apparatus.

[0069] The conventional 3-dimensional fast Fourier transformation based on the permutation algorithm has been explained above. In accordance with this method of computation, however, each processor needs to transfer data to all other processors during the permutation process 27 in the course of the whole process, as described in the section entitled ‘Background of the Invention,’ and such transfers of data become a bottleneck of the performance.

[0070] (3) 3-Dimensional Parallel-Processing Fast Fourier Transformation Based on a Method According to the Invention

[0071] In order to solve the problem cited above, in accordance with the present invention, the Y-direction and X-direction transformations based on the permutation algorithm are each split into two partial processes and the processing order is changed so as to allow communications and calculative processing to overlap each other. FIG. 1 shows a flowchart representing a computation process carried out in accordance with the present invention. In comparison with the conventional computation process based on the permutation algorithm as shown in FIG. 7, the computation process according to the present invention includes pieces of processing which are identical with pieces of processing included in the flowchart shown in FIG. 7. The identical pieces of processing are the pieces of processing ending with processing 25 and the pieces of processing starting with processing 29. For this reason, the flowchart shown in FIG. 1 represents only pieces of processing between processing 25 and processing 29. In the computation process according to the present invention, there are five pieces of processing between processing 25 and processing 29. These pieces of processing are described as follows:

[0072] <Y-Direction Transformation Process (1)>

[0073] In processing 33 of the flowchart shown in FIG. 1, a Y-direction transformation according to Eq. (2) is carried out only on part of the data portion which was obtained as a result of data division and stored, in processing 25 of the flowchart shown in FIG. 7, in a memory provided for each of the processors in the distributed-memory configuration. To be more specific, the Y-direction transformation is carried out only on data elements each having an even X coordinate j_(x). As described before, data to be subjected to transformation processes is arranged into a complex-data series forming a rectangular solid, and the complex-data series is then divided into as many data portions, each located on a plane perpendicular to the Z axis, as there are processors. Thus, since the complex-data series is not divided in the directions of the X and Y axes, the series forms a distributed layout that allows transformation processes in the X and Y directions to be completed without the need to transfer data between processors. In Y-direction transformation process (1), however, each of the processors carries out a transformation process in the Y direction only on part of the data portion, that is, only on data elements each having an even X coordinate j_(x).

[0074] <Y-Direction Transformation Process (2)>

[0075] In the next processing 34 of the flowchart shown in FIG. 1, each of the processors carries out a transformation process in the Y direction in accordance with Eq. (2) only on part of a data portion stored in its own memory. To be more specific, each processor carries out the Y-direction transformation process only on data elements each having an odd X coordinate j_(x). Concurrently with the transformation process, a permutation process is carried out only on data elements each having an even X coordinate j_(x) in the same way as the permutation process for the conventional method. In the permutation process, some of the data owned by each processor p is transferred from processor p to processor p′. To be more specific, processor p provides processor p′ with (N_(x)/2)×(N_(y)/P)×N_(z) data elements with a Y coordinate k_(y) satisfying the following relation:

(N_(y)/P)×p′≦k_(y)≦(N_(y)/P)×(p′+1)−1

[0076] As a result of this permutation process, only data elements each having an even X coordinate j_(x) are exchanged among the memories of the processors to be relocated in the memories.
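The selection rule of the permutation process can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `ky_block` and `elements_to_send` are hypothetical, P is assumed to divide N_(y) evenly, and `local_z` stands for whatever Z indices the sending processor currently holds.

```python
def ky_block(p_dest, n_y, n_procs):
    """Inclusive range of Y indices k_y assigned to processor p_dest.

    Implements (N_y/P)*p' <= k_y <= (N_y/P)*(p'+1) - 1.
    Assumes n_procs divides n_y evenly.
    """
    width = n_y // n_procs
    return width * p_dest, width * (p_dest + 1) - 1


def elements_to_send(p_dest, n_x, n_y, n_procs, local_z, parity):
    """Index triples (j_x, k_y, j_z) that one processor hands to p_dest.

    parity = 0 selects even X coordinates (processing 34),
    parity = 1 selects odd X coordinates (processing 35).
    local_z is the set of Z indices held by the sending processor.
    """
    lo, hi = ky_block(p_dest, n_y, n_procs)
    return [(jx, ky, jz)
            for jx in range(parity, n_x, 2)
            for ky in range(lo, hi + 1)
            for jz in local_z]
```

Calling `elements_to_send` once per destination processor yields the complete send list for one parity, which is the half of the data that moves while the other half is being transformed.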

[0077] Thus, in the resulting state, for data elements each having an even X coordinate j_(x), each of the processors is capable of completing the transformation in the Z direction without transferring data between processors. As for the X direction, each of the processors is capable of executing all the steps of the transformation process except the last step, that is, each processor is capable of executing the first log₂(N_(x))−1 steps of the transformation process carried out in the X direction, as processing performed on mutually adjacent data elements.

[0078] <X-Direction Transformation Process (1)>

[0079] In the next processing 35 of the flowchart shown in FIG. 1, each of the processors executes all steps of a transformation process carried out in the X direction except the last step, in accordance with Eq. (5) given below as a substitute for Eq. (3), only on part of a data portion stored in its own memory.

c^{(2')}_{k_x, k_y, j_z} = \sum_{j_x'=0}^{N_x/2-1} c^{(1)}_{2j_x', k_y, j_z} \exp\left(-2\pi i\, k_x (2j_x') / N_x\right)   (5)

[0080] where symbol k_(x) denotes every integer in the range 0, 1, . . . , (N_(x)−1), symbol k_(y) denotes every integer in the range 0, 1, . . . , (N_(y)−1) and symbol j_(z) denotes every integer in the range 0, 1, . . . , (N_(z)−1). To be more specific, each processor executes the first log₂(N_(x))−1 steps of the transformation process carried out in the X direction only on data elements each having an even X coordinate j_(x). Concurrently with the transformation process, a permutation process is carried out only on data elements each having an odd X coordinate j_(x) in the same way as the permutation process for the conventional method. In the permutation process, some of the data owned by each processor p is transferred from processor p to processor p′. To be more specific, processor p provides processor p′ with (N_(x)/2)×(N_(y)/P)×N_(z) data elements with a Y coordinate k_(y) satisfying the following relation:

(N_(y)/P)×p′≦k_(y)≦(N_(y)/P)×(p′+1)−1

[0081] Thus, the relocation of data results in a state in which, for data elements each having an odd X coordinate j_(x), each of the processors is capable of completing a transformation process in the Z direction. As for the X direction, each of the processors is capable of executing all the steps of the transformation process except the last step, that is, each processor is capable of executing the first log₂(N_(x))−1 steps of the transformation process carried out in the X direction, as processing performed on mutually adjacent data elements.

[0082] <X-Direction Transformation Process (2)>

[0083] In the next processing 36 of the flowchart shown in FIG. 1, each of the processors executes all steps of a transformation process in the X direction except the last step, in accordance with Eq. (6) given below as a substitute for Eq. (3), only on part of a data portion stored in its own memory.

c^{(2'')}_{k_x, k_y, j_z} = \sum_{j_x'=0}^{N_x/2-1} c^{(1)}_{2j_x'+1, k_y, j_z} \exp\left(-2\pi i\, k_x (2j_x'+1) / N_x\right)   (6)

[0084] where symbol k_(x) denotes every integer in the range 0, 1, . . . , (N_(x)−1), symbol k_(y) denotes every integer in the range 0, 1, . . . , (N_(y)−1) and symbol j_(z) denotes every integer in the range 0, 1, . . . , (N_(z)−1). To be more specific, each processor executes the first log₂(N_(x))−1 steps of the transformation process carried out in the X direction only on data elements each having an odd X coordinate j_(x).

[0085] <X-Direction Transformation Process (3)>

[0086] In the next processing 37 of the flowchart shown in FIG. 1, each of the processors executes the last step of the transformation process carried out in the X direction in accordance with Eq. (7) given below on a data portion stored in its own memory. Eq. (7) is derived by using c_(kx, ky, jz)^((2′)) found by using Eq. (5) and c_(kx, ky, jz)^((2″)) found by using Eq. (6) as follows:

c^{(2)}_{k_x, k_y, j_z} = c^{(2')}_{k_x, k_y, j_z} + c^{(2'')}_{k_x, k_y, j_z}   (7)

[0087] where symbol k_(x) denotes every integer in the range 0, 1, . . . , (N_(x)−1), symbol k_(y) denotes every integer in the range 0, 1, . . . , (N_(y)−1) and symbol j_(z) denotes every integer in the range 0, 1, . . . , (N_(z)−1).
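The relation among Eqs. (5), (6) and (7) can be checked numerically along a single line of data in the X direction. The following Python sketch is an illustration only: it evaluates the sums directly rather than in log₂(N_(x)) butterfly steps, and the names `partial_dft` and `full_dft` as well as the sample values are hypothetical.

```python
import cmath


def partial_dft(values, n_x, parity):
    """Sum over X indices of one parity for every output index k_x.

    parity = 0 evaluates the even-index sum of Eq. (5),
    parity = 1 evaluates the odd-index sum of Eq. (6).
    """
    return [sum(values[jx] * cmath.exp(-2j * cmath.pi * kx * jx / n_x)
                for jx in range(parity, n_x, 2))
            for kx in range(n_x)]


def full_dft(values):
    """Direct X-direction transform over all indices, for comparison."""
    n_x = len(values)
    return [sum(values[jx] * cmath.exp(-2j * cmath.pi * kx * jx / n_x)
                for jx in range(n_x))
            for kx in range(n_x)]


# Arbitrary sample line with N_x = 8 (illustrative data only).
line = [complex(j, -j * j) for j in range(8)]
even_part = partial_dft(line, 8, 0)   # can start as soon as even-X data has arrived
odd_part = partial_dft(line, 8, 1)    # can start once odd-X data has arrived
combined = [e + o for e, o in zip(even_part, odd_part)]  # Eq. (7)
```

Because the two partial sums touch disjoint halves of the data, the even-X half can be processed while the odd-X half is still in transit, and the final addition is the only step that needs both.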

[0088] In Y-direction transformation process (2) according to the method of computation described above, each processor is capable of carrying out the transformation process in the Y direction only on data elements each having an odd X coordinate and a permutation process only on data elements each having an even X coordinate at the same time. In X-direction transformation process (1) according to the method of computation described above, on the other hand, each processor is capable of carrying out the transformation process in the X direction only on data elements each having an even X coordinate and a permutation process only on data elements each having an odd X coordinate at the same time. Thus, some or all of the transfer time can be concealed behind the processing time so that the efficiency of the computation can be increased.
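The concealment effect can be illustrated with a simple timing model. The Python sketch below is not part of the described method; it assumes that each half of a transformation takes half of the full time, that each half of the permutation takes half of the transfer time, and that computation and transfer overlap perfectly. All function and parameter names are hypothetical.

```python
def conventional_time(t_y, t_x, t_comm, t_last):
    """Y transform, then the whole permutation, then the X transform."""
    return t_y + t_comm + t_x + t_last


def overlapped_time(t_y, t_x, t_comm, t_last):
    """Five-stage schedule of processing 33 to 37 under the stated assumptions.

    t_x covers the X-direction steps except the last one; t_last is the last step.
    """
    half_y, half_x, half_comm = t_y / 2, t_x / 2, t_comm / 2
    stage_33 = half_y                    # Y transform, even X
    stage_34 = max(half_y, half_comm)    # Y transform, odd X, with transfer of even X
    stage_35 = max(half_x, half_comm)    # X steps, even X, with transfer of odd X
    stage_36 = half_x                    # X steps, odd X
    stage_37 = t_last                    # last X step combining both halves
    return stage_33 + stage_34 + stage_35 + stage_36 + stage_37
```

In this model the transfer is hidden completely whenever half of the transfer time does not exceed half of either transformation time; otherwise only part of it is hidden, which matches the statement that some or all of the transfer time can be concealed.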

[0089] (4) Parallel-Processing High-Speed Fourier Transformation Library

[0090] The following description explains a typical case in which the present invention is applied to a library implementing the 3-dimensional fast Fourier transformation by using a parallel-processing computer. The name of the library is FFT3D, and the library is executed on all specified processors running concurrently by the following call statement:

CALL FFT3D (N_(x), N_(y), N_(z), P, F, TB, IOPT, IER)

[0091] where symbol N_(x) denotes the number of pieces of 3-dimensional data to be subjected to a Fourier transformation carried out in the X direction, symbol N_(y) denotes the number of pieces of 3-dimensional data to be subjected to a Fourier transformation carried out in the Y direction, symbol N_(z) denotes the number of pieces of 3-dimensional data to be subjected to a Fourier transformation carried out in the Z direction, symbol P denotes the number of processors for carrying out the Fourier transformations, symbol F denotes an array for storing 3-dimensional data {f_(jx, jy, jz)} to be subjected to Fourier transformations at an input time or Fourier-transformation results {c_(kx, ky, kz)} at an output time, symbol TB denotes a table for storing computed values of a complex exponential function used in the Fourier transformations, symbol IOPT denotes an input argument specifying which function of the library is to be carried out, and symbol IER denotes an output indicating whether or not a run-time error has been generated in the execution of the library. The array F represents a partial array owned by each processor. Since the block division technique is applied to the input data with respect to the Z coordinate, processor p, where 0≦p≦(P−1), serving as the pth processor has only N_(x)×N_(y)×N_(z)/P pieces of data with a Z coordinate j_(z) satisfying the relation (N_(z)/P)×p≦j_(z)≦(N_(z)/P)×(p+1)−1. That is to say, the array F assigned to the pth processor is used for storing data as expressed by the following equation:

F(j_(x), j_(y), j_(z)′)=f_(jx, jy, (Nz/P)*p+jz′)  (8)

[0092] where symbol j_(x) denotes every integer in the range 0, 1, . . . , (N_(x)−1), symbol j_(y) denotes every integer in the range 0, 1, . . . , (N_(y)−1) and symbol j_(z)′ denotes every integer in the range 0, 1, . . . , (N_(z)/P−1). Thus, the size of every array F assigned to a processor is N_(x)×N_(y)×N_(z)/P. The table TB is a table for storing values computed in the first call as values of a complex exponential function. These values of a complex exponential function can be reused in the second and subsequent calls so that the computation of the values does not need to be repeated. In the first call, the argument IOPT is set at 1 to indicate that a table is to be created as the table TB for storing computed values of a complex exponential function. In the second and subsequent calls, on the other hand, the argument IOPT is set at 2 to indicate that the computed values already stored in the table TB are to be used.
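The index mapping of Eq. (8) can be written out as a short Python sketch. The helper names `local_z_range` and `global_z` are hypothetical and serve only to illustrate the block division; P is assumed to divide N_(z) evenly.

```python
def local_z_range(p, n_z, n_procs):
    """Global Z indices j_z held by processor p under block division."""
    width = n_z // n_procs            # assumes n_procs divides n_z evenly
    return range(width * p, width * (p + 1))


def global_z(p, jz_local, n_z, n_procs):
    """Global Z index for local index j_z' on processor p, as in Eq. (8)."""
    return (n_z // n_procs) * p + jz_local
```

For example, with N_(z)=8 and P=4, processor 2 holds the global Z indices 4 and 5, and its local index 1 corresponds to the global index 5.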

[0093] FIG. 8 shows a flowchart representing the operation of the library. When the library is called in processing 39, the library examines the validity of input arguments in the next processing 40. To put it concretely, for example, the library examines the arguments N_(x), N_(y), N_(z) and P to confirm that they are each an integer, and examines the argument IOPT to confirm that its value is 1 or 2. If any of the input arguments has an invalid value, the flow of the operation goes on to processing 41 in which IER is set at 1,000. Then, control is returned to the calling program. If none of the input arguments has an invalid value, on the other hand, the flow of the operation goes on to processing 42 in which other processors are informed of this call. Then, the flow of the operation goes on to processing 43 to determine whether or not the library has been called for P processors as specified by the argument P. If this condition is not satisfied, the flow of the operation goes on to processing 44 in which IER is set at 2,000. Then, control is returned to the calling program. If this condition is satisfied, on the other hand, the flow of the operation goes on to processing 45 in which the value of the argument IOPT is examined. If the value of the argument IOPT is 1, the flow of the operation goes on to processing 46 in which values of a complex exponential function used in the Fourier transformations in all directions are computed and stored in the table TB. Then, the flow of the operation goes on to processing 47.
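The argument checks of processing 40 to 44 can be summarized by the following Python sketch. It is only an illustration of the control flow: the function name `check_arguments` and the parameter `callers` (the number of processors that actually issued the call) are hypothetical, and the return value 0 for the error-free case is an assumption, since the value of IER on success is not stated here.

```python
def check_arguments(n_x, n_y, n_z, n_procs, iopt, callers):
    """Return an IER-style code following processing 40 to 44 of FIG. 8."""
    sizes = (n_x, n_y, n_z, n_procs)
    # Processing 40: the sizes and P must be integers, IOPT must be 1 or 2.
    sizes_ok = all(isinstance(v, int) and not isinstance(v, bool) for v in sizes)
    if not sizes_ok or iopt not in (1, 2):
        return 1000        # processing 41
    # Processing 43: the library must have been called on exactly P processors.
    if callers != n_procs:
        return 2000        # processing 44
    return 0               # assumed success code (not given in the text)
```

A caller would inspect the returned code before relying on the contents of the array F.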

[0094] If the value of the argument IOPT is 2, on the other hand, the flow of the operation goes on directly to the processing 47 in which transformation processes in the Y and X directions are carried out. These transformation processes are carried out in accordance with a method described in a paragraph included in the description of this embodiment as a paragraph with a title of ‘(3) 3-Dimensional Parallel-Processing Fast Fourier Transformation Based on a Method Provided by the Invention.’ To put it concretely, these transformation processes are completed by sequentially carrying out five processes ranging from Y-direction transformation process (1) performed in the processing 33 of the flowchart shown in FIG. 1 to X-direction transformation process (3) performed in the processing 37 of the flowchart shown in FIG. 1. Then, in the next processing 48 of the flowchart shown in FIG. 8, a transformation process in the Z direction is carried out in accordance with a method described in a paragraph included in the description of this embodiment as a paragraph with a title of ‘(2) 3-Dimensional Parallel-Processing Fast Fourier Transformation Based on the Conventional Method.’ At the end of this transformation process carried out in the Z direction, the 3-dimensional fast Fourier transformation processing is all completed. At the completion time, pieces of data obtained as a result of a block-division process carried out with respect to the Y coordinate are stored in the array F in the same way as the 3-dimensional parallel-processing fast Fourier transformation based on the conventional method. To put it concretely, in the next processing 49 of the flowchart shown in FIG. 8, the following pieces of data are stored in the array F assigned to the pth processor:

F(k_(x), k_(y)′, k_(z))=c_(kx, (Ny/P)*p+ky′, kz)   (9)

[0095] where symbol k_(x) denotes every integer in the range 0, 1, . . . , (N_(x)−1), symbol k_(y)′ denotes every integer in the range 0, 1, . . . , (N_(y)/P−1) and symbol k_(z) denotes every integer in the range 0, 1, . . . , (N_(z)−1).

[0096] (5) Simulation Program

[0097] FIG. 9 shows a flowchart representing a parallel-processing program developed for a weather calculation to be carried out in this embodiment. The following description explains a typical application of the program to a computation for a 3-dimensional mesh with a size of N_(x)×N_(y)×N_(z).

[0098] The flowchart representing the parallel-processing program begins with processing 51 in which the program inputs parameters, which include the sizes N_(x), N_(y) and N_(z) of a computation area, initial conditions and material constants. The initial conditions include a temperature, a velocity of the wind and a pressure, whereas the material constants include mainly the thermal conductivity of the air. In addition, the program carries out preprocessing required for the computation. In this preprocessing, interpolations are carried out on the temperature, the velocity of the wind and the pressure, which have been obtained as a result of observation, in order to obtain data at mesh points as data required for the computation. Then, in the next processing 52, data including the temperature, the velocity of the wind and the pressure is distributed to processors employed in the parallel-processing computer. The distributed data results from a block-division process carried out on the input data in the Z direction. Thus, the distributed data allows the use of the 3-dimensional fast Fourier transformation library FFT3D described above.

[0099] After the pieces of processing 51 and 52 are finished, the flow of the program enters a loop to find quantities such as the temperature, the velocity of the wind and the pressure for a variety of time steps one after another. The three fundamental equations are as follows:

[0100] An equation of motion for the velocity of the wind:

du/dt=−2Ω×u−(1/ρ)∇p+F_(u)   (10)

[0101] An equation for conservation of mass:

dρ/dt=−ρ∇·u   (11)

and

[0102] An equation expressing a change in temperature:

dT/dt=−κ∇ ² T+u·∇T   (12)

[0103] where symbol u denotes the velocity of the wind, symbol p denotes the pressure, symbol T denotes the temperature, symbol Ω denotes the angular velocity of the rotation of the earth, which gives rise to the Coriolis force expressed by the term −2Ω×u, symbol F_(u) denotes other forces, symbol ρ denotes the density of the air and symbol κ denotes the thermal conductivity of the air. In order to find values of data for the next point of time from the above equations, first of all, in processing 53, the wind velocity u, the pressure p and the temperature T at a lattice point are transformed into pieces of data in a wave-number space by adopting the Fourier transformation. Then, in the next processing 54, these pieces of data in the wave-number space are differentiated. Subsequently, in the next processing 55, the pieces of data in the wave-number space are subjected to an inverse Fourier transformation process to find a temperature gradient ∇T, a second derivative ∇²T, a pressure gradient ∇p and a velocity divergence ∇·u. Then, in the next processing 56, these quantities are substituted into expressions on the right side of Eqs. (10) to (12) to find the wind velocity u, the pressure p and the temperature T for the next time step. The wind velocity u, the pressure p and the temperature T at a lattice point are transformed into pieces of data in a wave-number space by adopting the Fourier transformation as described above because, in this way, the derivatives can be found with a high degree of precision. In this program, the 3-dimensional Fourier transformation library FFT3D is applied to this part of the calculation.
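The sequence of processing 53 to 55 (transform, differentiate in the wave-number space, transform back) can be illustrated in one dimension. The Python sketch below is a simplification under stated assumptions: it uses a direct discrete Fourier transform instead of the FFT3D library, it assumes real-valued periodic samples on a uniform mesh, and the names `dft` and `spectral_derivative` are hypothetical.

```python
import cmath
import math


def dft(values, sign):
    """Direct discrete Fourier transform; sign=-1 forward, sign=+1 unnormalized inverse."""
    n = len(values)
    return [sum(values[j] * cmath.exp(sign * 2j * math.pi * k * j / n)
                for j in range(n))
            for k in range(n)]


def spectral_derivative(samples, length):
    """First derivative of periodic real samples on a domain of the given length."""
    n = len(samples)
    spectrum = dft(samples, -1)                   # processing 53: to wave-number space
    for k in range(n):
        m = k if k < n / 2 else k - n             # signed wave number
        if n % 2 == 0 and k == n // 2:
            m = 0                                 # drop the unpaired Nyquist mode
        spectrum[k] *= 1j * (2 * math.pi * m / length)   # processing 54: differentiate
    # processing 55: back to real space (1/n normalizes the inverse transform)
    return [value.real / n for value in dft(spectrum, +1)]
```

Applied to samples of sin(x) over one period, the function reproduces cos(x) to rounding error, which illustrates why the derivatives are obtained with a high degree of precision in the wave-number space.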

[0104] In the next processing 57 at the end of the loop, the status of the computation is examined to determine whether or not the computation to find the quantities for all time steps ending at the supposed last point of time has been completed. If the computation has been completed, the flow of the program goes on to processing 58 in which post-processing is carried out. Then, in the next processing 59, results of the computation are output. In the post-processing, results are interpolated to find values for specific points at which data is required, in the case where the mesh points of the computation differ from the specific points.

[0105] The above application has been explained by taking a calculation to forecast the weather as an example. It is obvious, however, that the technique provided by the present invention can also be adopted for other applications. For example, the technique can be applied to a simulation adopting the 3-dimensional fast Fourier transformation in a parallel-processing computer. In addition, in the application described above, prior to the Fourier transformation process, 3-dimensional data is subjected to a block-division process carried out in the Z direction but, after the Fourier transformation process, data is subjected to a block-division process carried out in the Y direction. It is obvious, however, that the technique provided by the present invention can also be adopted for applications in which, in place of the block-division processes, cyclic-division processes or block-cyclic-division processes are carried out. Furthermore, in the application described above, the transformation processes in the Y, X and Z directions are carried out in the Y→X→Z order. However, Y, X and Z are no more than names assigned to the axes of coordinates for the sake of convenience. Thus, it is obvious that the technique provided by the present invention can also be adopted in the same way for an application method in which, for example, the name Y is replaced by the name X (X→Y), the name Z is replaced by the name Y (Y→Z) and the name X is replaced by the name Z (Z→X).

[0106] <<Second Embodiment>>

[0107] As described above, in the first embodiment, a method of computation is adopted for the 3-dimensional Fourier transformation. However, this method of computation can also be applied to a 1-dimensional Fourier transformation. As is widely known, in order to apply the 1-dimensional Fourier transformation to data of N points, the data is arranged to form a rectangular solid with a size of N_(x)×N_(y)×N_(z) where symbols N_(x), N_(y) and N_(z) each denote an arbitrary integer and satisfy the relation N_(x)×N_(y)×N_(z)=N. Then, a 3-dimensional Fourier transformation to which a process referred to as a ‘twist-coefficient multiplication’ is added is carried out on the arranged data. The twist-coefficient multiplication is a process in which, after the Y-direction transformation according to Eq. (2), processing is carried out on the intermediate result c_(jx, ky, jz)⁽¹⁾ in accordance with Eq. (13) as follows:

[0108] <Y-direction transformation according to Eq. (2)>

c^{(1)}_{j_x, k_y, j_z} := c^{(1)}_{j_x, k_y, j_z} \times \exp\left(2\pi i\, k_y j_x / (N_x N_y)\right)   (13)

[0109] where symbol j_(x) denotes every integer in the range 0, 1, . . . , (N_(x)−1), symbol k_(y) denotes every integer in the range 0, 1, . . . , (N_(y)−1) and symbol j_(z) denotes every integer in the range 0, 1, . . . , (N_(z)−1). In addition, after the X-direction transformation according to Eq. (3), processing is carried out on the intermediate result c_(kx, ky, jz)⁽²⁾ in accordance with Eq. (14) as follows:

[0110] <X-direction transformation according to Eq. (3)>

c^{(2)}_{k_x, k_y, j_z} := c^{(2)}_{k_x, k_y, j_z} \times \exp\left(2\pi i\, (N_y k_x + k_y) j_z / (N_x N_y N_z)\right)   (14)

[0111] where symbol k_(x) denotes every integer in the range 0, 1, . . . , (N_(x)−1), symbol k_(y) denotes every integer in the range 0, 1, . . . , (N_(y)−1) and symbol j_(z) denotes every integer in the range 0, 1, . . . , (N_(z)−1). As described above, the twist-coefficient multiplication comprises independent pieces of processing carried out on data elements of the arrays c_(jx, ky, jz)⁽¹⁾ and c_(kx, ky, jz)⁽²⁾. Thus, these pieces of processing can be incorporated respectively after the Y-direction transformation and the X-direction transformation in the method provided by the first embodiment. In this way, the equations provided by the present invention can also be applied to the 1-dimensional fast Fourier transformation.
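The construction of a 1-dimensional transformation from a 3-dimensional one with two twist-coefficient multiplications can be checked on a small case. The Python sketch below is an illustration under stated assumptions, none of which is fixed by the text above: the input index is taken as j=(j_(y)×N_(x)+j_(x))×N_(z)+j_(z), the output index as k=k_(y)+N_(y)×k_(x)+N_(x)×N_(y)×k_(z), and the sign of the twist exponent is chosen to match a forward kernel exp(−2πijk/N). The function names are hypothetical, and direct sums stand in for the fast transformations.

```python
import cmath


def twiddle(numerator, denominator):
    """exp(-2*pi*i*numerator/denominator); the sign convention is an assumption."""
    return cmath.exp(-2j * cmath.pi * numerator / denominator)


def direct_dft(data):
    """Reference 1-dimensional transform evaluated directly."""
    n = len(data)
    return [sum(data[j] * twiddle(j * k, n) for j in range(n)) for k in range(n)]


def dft_via_3d(data, n_x, n_y, n_z):
    """1-D transform as Y, X, Z transforms with two twist multiplications."""
    n = n_x * n_y * n_z
    xs, ys, zs = range(n_x), range(n_y), range(n_z)
    # Arrange the N points as a rectangular solid (assumed index mapping).
    c0 = {(jx, jy, jz): data[(jy * n_x + jx) * n_z + jz]
          for jx in xs for jy in ys for jz in zs}
    # Y-direction transform followed by the first twist, as in Eq. (13).
    c1 = {(jx, ky, jz): sum(c0[jx, jy, jz] * twiddle(ky * jy, n_y) for jy in ys)
                        * twiddle(ky * jx, n_x * n_y)
          for jx in xs for ky in ys for jz in zs}
    # X-direction transform followed by the second twist, as in Eq. (14).
    c2 = {(kx, ky, jz): sum(c1[jx, ky, jz] * twiddle(kx * jx, n_x) for jx in xs)
                        * twiddle((n_y * kx + ky) * jz, n)
          for kx in xs for ky in ys for jz in zs}
    # Z-direction transform and placement at the assumed output index.
    out = [0j] * n
    for kx in xs:
        for ky in ys:
            for kz in zs:
                out[ky + n_y * kx + n_x * n_y * kz] = sum(
                    c2[kx, ky, jz] * twiddle(kz * jz, n_z) for jz in zs)
    return out
```

Because each twist multiplies every element independently, it can be appended to the Y-direction and X-direction partial processes of the first embodiment without changing which data is in transit at any stage.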

[0112] <<Third Embodiment>>

[0113] The present invention can be further applied to a 2-dimensional fast Fourier transformation. The present invention is applied to the 2-dimensional fast Fourier transformation in the same way as the first embodiment except that N_(z) is set at 1.

[0114] In addition, in the case of the application to the 2-dimensional fast Fourier transformation, data serving as an object of the transformation is a complex-data array arranged 2-dimensionally in the directions of the X and Y axes. At a stage of a data-division process prior to ‘Y-Direction Transformation Process (1)’ explained in the description of the first embodiment, the complex-data array is divided into as many data portions, each laid out on a plane perpendicular to the X axis, as there are processors, and each of the data portions is stored in a memory provided for one of the processors in a distributed-memory configuration. Thus, the complex-data array is put in a state allowing a transformation in the Y direction to be completed without transferring data from one processor to another. In ‘Y-Direction Transformation Process (1)’, however, each of the processors carries out a transformation in the Y direction only on data elements each having an even X coordinate j_(x). Then, in Y-Direction Transformation Process (2), each of the processors carries out a transformation in the Y direction only on data elements each having an odd X coordinate j_(x), and concurrently with this Y-direction transformation, data elements each having an even X coordinate j_(x) are relocated by being transferred among the processors in another rearrangement process. In this other rearrangement process, the complex-data array is divided into as many data portions, each laid out along a straight line perpendicular to the Y axis, as there are processors, and each of the data portions is stored in a memory provided for one of the processors in a distributed-memory configuration. As a result, each of the processors is capable of executing all steps of the transformation process carried out in the X direction except the last step, that is, each processor is capable of executing the first log₂(N_(x))−1 steps of the X-direction transformation. Furthermore, in X-Direction Transformation Process (1), each of the processors executes the first log₂(N_(x))−1 steps of the transformation carried out in the X direction on the relocated data elements each having an even X coordinate j_(x), and data elements each having an odd X coordinate j_(x) are relocated by being transferred among the processors concurrently with the execution of these first steps of the X-direction transformation. The data elements each having an odd X coordinate j_(x) are rearranged for relocation in the same way as the other rearrangement process described above. Subsequently, the transformation processing is continued to X-Direction Transformation Process (2) and, then, to X-Direction Transformation Process (3) in exactly the same way as the first embodiment. Since the data serving as an object of transformation is 2-dimensional data, however, the transformation processing is ended with X-Direction Transformation Process (3).

[0115] <<Fourth Embodiment>>

[0116] The following description explains a computation of a structure of electrons in a semiconductor device or the like as another application carrying out a simulation by adoption of the fast Fourier transformation according to the present invention. In the computation of a structure of electrons, a wave function u(r) of electrons defined in a 3-dimensional mesh is computed in accordance with the following Schroedinger equation:

du(r)/dt=−(h ²/2m)∇² u(r)+(E−V(r))u(r)   (15)

[0117] where symbol h denotes Planck's constant, symbol m denotes the mass of an electron, symbol E denotes the energy level of the computed wave function and symbol V denotes a potential energy of other electrons and atoms in the crystal. The wave function is computed in order to find quantities such as the size of a band gap determining the properties of the semiconductor and the stability of the structure of a crystal.

[0118] In the computation of the expression on the right side of Eq. (15), a second derivative ∇²u(r) of the wave function is required. For the same reason as the one explained in the description of the weather-forecasting application, however, this portion is computed after u(r) is moved to the wave-number space by applying the Fourier transformation, and the result of the computation is returned to the real space. Thus, when a structure of electrons is found by using a parallel-processing computer, the 3-dimensional fast Fourier transformation according to the present invention can be applied to this portion.

[0119] As described above, in accordance with the present invention, in the fast Fourier transformation implemented in a parallel-processing computer having a distributed-memory configuration, a data manipulation process and a data transfer process are carried out concurrently so that some or all of the time it takes to carry out the latter process can be concealed behind the time required for performing the former process. Thus, by adoption of the method provided by the present invention, the efficiency of the processing parallelism can be increased over the processing efficiency of the conventional method. It is to be noted that, while the degree of the efficiency improvement depends heavily on the inter-processor communication performance of the parallel-processing computer having a distributed-memory configuration, a reduction of the execution time by about 20% to 30% can be expected, for example, when solving a problem with a typical N value of 256³ by using 16 processors.

1. A fast Fourier transformation method of implementing a fast Fourier transformation in a parallel-processing computer comprising an input apparatus, a processing apparatus, which includes a plurality of processors each having one of memories and a network for transferring data among said memories employed for said processors, an output apparatus and an external storage apparatus, said fast Fourier transformation method comprising the steps of: dividing data serving as an object of said Fourier transformation into a plurality of data portions and distributing said data portions to said processors in order to store said data portions in said memories employed for said processors; further dividing each of said data portions distributed to said processors into a first data part and a second data part and driving every individual one of said processors to carry out a process on said first data part of said data portion distributed to said individual processor; and driving every individual one of said processors to carry out a process on said second data part of said data portion distributed to said individual processor and, at the same time, concurrently carry out processing to relocate results of said process carried out on said first data part by reassigning said results among said processors.
2. A fast Fourier transformation method according to claim 1, wherein said process carried out on said first data part and said process carried out on said second data part are each a process of a Fourier transformation performed on said data parts in the direction of a first axis; and wherein said method further comprises the step of, subsequently to the step of relocating results of said process carried out on said first data part, carrying out a process of said Fourier transformation performed in the direction of a second axis on said data relocated among said processors.
3. A fast Fourier transformation method according to claim 2 wherein: said first data part of each of said data portions is pieces of data included in a data array serving as an object of said Fourier transformation as even-numbered pieces of data each set in said direction of the second axis; and said second data part of each of said data portions is pieces of data included in said data array serving as an object of said Fourier transformation as odd-numbered pieces of data set in said direction of the second axis.
4. A fast Fourier transformation method according to claim 3 wherein, concurrently with said step of carrying out said Fourier transformation performed in said direction of said second axis on said relocated data, every individual one of said processors relocates results of said Fourier transformation carried out in said direction of said first axis on said second data part of said data portion pertaining to said individual processor by reassigning said results to the other ones of said processors.
5. A fast Fourier transformation method according to claim 4, further comprising the steps of: carrying out said Fourier transformation in said direction of said second axis on data of said relocated second data part of each of said data portions upon completion of said step of relocating said results of said Fourier transformation carried out in said direction of said first axis on said second data part of each of said data portions; and carrying out an end process of said Fourier transformation performed in said direction of said second axis on said data serving as an object of said Fourier transformation by using both results of said Fourier transformation performed in said direction of said second axis on said second data part of each of said data portions and results obtained earlier as results of said Fourier transformation performed in said direction of said second axis on said first data part of each of said data portions.
 6. A 3-dimensional fast Fourier transformation method of implementing a 3-dimensional fast Fourier transformation in a parallel-processing computer comprising an input apparatus, a processing apparatus, which includes a plurality of processors each having one of memories and a network for transferring data among said memories employed for said processors, an output apparatus and an external storage apparatus, said 3-dimensional fast Fourier transformation method comprising the steps of: dividing data into data portions each laid out on one of planes, which are oriented perpendicularly to the direction of a Z axis and arranged to form a rectangular solid with dimensions of NX, NY and NZ, where NX and NY are side lengths of said rectangular solid in the directions of X and Y axes respectively whereas NZ is a side length of said rectangular solid in said direction of said Z axis, and distributing said data portions to said processors in order to store said data portions in said memories employed for said processors; driving every individual one of said processors to carry out a transformation process in said direction of said Y axis only on data elements each having an even X coordinate; driving every individual one of said processors to carry out another transformation process in said direction of said Y axis only on data elements each having an odd X coordinate and, concurrently with said another transformation process, carry out a data transfer process on said data elements each having an even X coordinate so as to distribute data parts, which are each laid out on a plane perpendicular to said direction of said Y axis as a result of another division of said data, to said processors other than said individual processor in order to store said data parts in said memories employed for said other processors; driving every individual one of said processors to execute the first log₂(NX)−1 steps of a transformation process carried out in said direction of said X axis only on data elements each having an even X coordinate and, concurrently with said transformation process carried out in said direction of said X axis, carry out a data transfer process on said data elements each having an odd X coordinate so as to distribute said data parts, which are each laid out on a plane perpendicular to said direction of said Y axis as a result of said other division of said data, to said processors other than said individual processor in order to store said data parts in said memories employed for said other processors; driving every individual one of said processors to execute said first log₂(NX)−1 steps of said transformation process carried out in said direction of said X axis only on data elements each having an odd X coordinate; driving each of said processors to execute the last step of said transformation process carried out in said direction of said X axis; and driving each of said processors to carry out a transformation process in said direction of said Z axis.
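The split of the X-direction transformation recited in claim 6 can be illustrated with a minimal sketch. It assumes a radix-2, decimation-in-time formulation with NX a power of two, which is one possible reading of the claim and not necessarily the formulation used in the embodiments. Under that assumption, all butterfly stages except the last amount to one half-length transform of the even-X elements and one of the odd-X elements, and the last step combines the two. The names `fft_radix2`, `combine_last_step`, `fft_x_split` and `dft_reference` are hypothetical and chosen only for this illustration.

```python
import cmath


def dft_reference(x):
    """Naive O(N^2) DFT, used only to check the sketch."""
    n = len(x)
    return [
        sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]


def combine_last_step(even_part, odd_part):
    """Last butterfly step: merges the even-X and odd-X half transforms."""
    half = len(even_part)
    n = 2 * half
    out = [0j] * n
    for k in range(half):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd_part[k]
        out[k] = even_part[k] + t
        out[k + half] = even_part[k] - t
    return out


def fft_radix2(x):
    """Recursive radix-2 FFT; len(x) is assumed to be a power of two."""
    if len(x) == 1:
        return list(x)
    return combine_last_step(fft_radix2(x[0::2]), fft_radix2(x[1::2]))


def fft_x_split(row):
    """X-direction transform of one row, split as in the claim (assumed reading).

    A transform of length NX/2 has log2(NX) - 1 butterfly stages, so the two
    calls below touch only even-X or only odd-X elements and are independent.
    """
    even_part = fft_radix2(row[0::2])  # could run while odd-X data is in transit
    odd_part = fft_radix2(row[1::2])   # runs once the odd-X data has arrived
    return combine_last_step(even_part, odd_part)
```

Because the two half-length transforms share no data, one of them can proceed while the elements needed by the other are still being transferred; only the final combining step needs both halves.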
 7. A 1-dimensional fast Fourier transformation method of applying a 1-dimensional fast Fourier transformation to data at N points in a parallel-processing computer comprising an input apparatus, a processing apparatus, which includes a plurality of processors each having one of memories and a network for transferring data among said memories employed for said processors, an output apparatus and an external storage apparatus, said 1-dimensional fast Fourier transformation method comprising the steps of: dividing data serving as an object of said Fourier transformation into data portions each laid out on one of planes, which are oriented perpendicularly to the direction of a Z axis and arranged to form a rectangular solid with dimensions of NX, NY and NZ, where NX and NY are side lengths of said rectangular solid in the directions of X and Y axes respectively whereas NZ is a side length of said rectangular solid in said direction of said Z axis, and distributing said data portions to said processors in order to store said data portions in said memories employed for said processors; driving every individual one of said processors to carry out a transformation process in said direction of said Y axis only on data elements each having an even X coordinate, and a multiplication process on twist coefficients; driving every individual one of said processors to carry out another transformation process in said direction of said Y axis only on data elements each having an odd X coordinate as well as a multiplication process on twist coefficients and, concurrently with said another transformation process and said multiplication process, carry out a data transfer process on said data elements each having an even X coordinate so as to distribute data parts, which are each laid out on a plane perpendicular to said direction of said Y axis as a result of another division of said data serving as an object of said Fourier transformation, to said processors other than said individual processor in order to store said data parts in said memories employed for said other processors; driving every individual one of said processors to execute the first log₂(NX)−1 steps of a transformation process carried out in said direction of said X axis only on data elements each having an even X coordinate and, concurrently with said transformation process carried out in said direction of said X axis, carry out a data transfer process on said data elements each having an odd X coordinate so as to distribute said data parts, which are each laid out on a plane perpendicular to said direction of said Y axis as a result of said other division of said data serving as an object of said Fourier transformation, to said processors other than said individual processor in order to store said data parts in said memories employed for said other processors; driving every individual one of said processors to execute said first log₂(NX)−1 steps of said transformation process carried out in said direction of said X axis only on data elements each having an odd X coordinate; driving each of said processors to execute the last step of said transformation process carried out in said direction of said X axis and a multiplication process on twist coefficients; and driving each of said processors to carry out a transformation process in said direction of said Z axis.
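The role of the twist coefficients in claim 7 can be illustrated with a simplified sketch. It is reduced to two axes instead of the three recited in the claim, and it uses naive transforms along each axis, so it shows only the principle: a 1-dimensional transform of N = N1 × N2 points can be computed as transforms along one axis, a pointwise multiplication by twist coefficients (commonly called twiddle factors), and transforms along the other axis. The index mapping and the names `dft_naive` and `dft_two_axis` are assumptions made for this illustration and are not taken from the specification.

```python
import cmath


def dft_naive(x):
    """Naive O(N^2) DFT used for the per-axis transforms and as a reference."""
    n = len(x)
    return [
        sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]


def dft_two_axis(x, n1, n2):
    """1-D DFT of n1*n2 points computed as a 2-D problem with twist factors.

    Assumed input index mapping:  n = i1 + n1 * i2
    Assumed output index mapping: k = n2 * k1 + k2
    """
    n = n1 * n2
    assert len(x) == n

    # Step 1: for each fixed i1, transform along the second axis (over i2).
    a = [dft_naive([x[i1 + n1 * i2] for i2 in range(n2)]) for i1 in range(n1)]

    # Step 2: multiply by the twist coefficients exp(-2*pi*i * i1 * k2 / n).
    for i1 in range(n1):
        for k2 in range(n2):
            a[i1][k2] *= cmath.exp(-2j * cmath.pi * i1 * k2 / n)

    # Step 3: for each fixed k2, transform along the first axis (over i1).
    out = [0j] * n
    for k2 in range(n2):
        column = dft_naive([a[i1][k2] for i1 in range(n1)])
        for k1 in range(n1):
            out[n2 * k1 + k2] = column[k1]
    return out
```

For example, `dft_two_axis(x, 3, 4)` on a 12-point input agrees with `dft_naive(x)` up to rounding error. A three-axis version applies the same idea once more, which is why the claim recites more than one multiplication by twist coefficients.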
 8. A fast Fourier transformation method of implementing a fast Fourier transformation in a parallel-processing computer comprising an input apparatus, a processing apparatus, which includes a plurality of processors each having one of memories and a network for transferring data among said memories employed for said processors, an output apparatus and an external storage apparatus, wherein: a monitor for observing status of said processors employed in said parallel-processing computer and status of communications among said processors shows that processing carried out by said processors comprises the following four phases: a first phase of driving any individual one of said processors to carry out calculative operations only; a second phase of driving any individual one of said processors to carry out calculative operations as well as operations to transfer data to said processors other than said individual processor at the same time; a third phase of driving any individual one of said processors to carry out calculative operations as well as operations to transfer data to said processors other than said individual processor at the same time; and a fourth phase of driving any individual one of said processors to carry out calculative operations only.
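The four phases recited in claim 8 can be sketched as a small software pipeline over two data parts. This is a minimal illustration under stated assumptions: a background thread stands in for the inter-processor network, and `run_four_phases`, `stage_a`, `transfer` and `stage_b` are hypothetical names that do not appear in the specification. On an actual distributed-memory machine the transfer would be a non-blocking communication operation.

```python
from concurrent.futures import ThreadPoolExecutor


def run_four_phases(parts, stage_a, transfer, stage_b):
    """Process two data parts so that each transfer overlaps a computation.

    parts    : [first_part, second_part]
    stage_a  : computation applied before the transfer
    transfer : stand-in for relocating data among processors
    stage_b  : computation applied after the transfer
    Returns the two results and a log of what each phase did.
    """
    first, second = parts
    log = []
    with ThreadPoolExecutor(max_workers=1) as network:
        # Phase 1: computation only (first part).
        a_first = stage_a(first)
        log.append("compute")

        # Phase 2: compute on the second part while the first is transferred.
        pending = network.submit(transfer, a_first)
        a_second = stage_a(second)
        moved_first = pending.result()
        log.append("compute+transfer")

        # Phase 3: compute on the first part while the second is transferred.
        pending = network.submit(transfer, a_second)
        b_first = stage_b(moved_first)
        moved_second = pending.result()
        log.append("compute+transfer")

        # Phase 4: computation only (second part).
        b_second = stage_b(moved_second)
        log.append("compute")
    return [b_first, b_second], log
```

A small usage example: calling `run_four_phases([[1, 2], [3, 4]], lambda v: [2 * x for x in v], lambda v: list(reversed(v)), sum)` yields the results `[6, 14]` and a log with a compute-only phase at each end and two overlapped phases in between.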
 9. A weather-forecast computation system for computing a weather forecast by application of a fast Fourier transformation implemented by using a parallel-processing computer, said weather-forecast computation system adopting a fast Fourier transformation method implementing a fast Fourier transformation in a parallel-processing computer comprising an input apparatus, a processing apparatus, which includes a plurality of processors each having one of memories and a network for transferring data among said memories employed for said processors, an output apparatus and an external storage apparatus, said fast Fourier transformation method comprising the steps of: dividing data serving as an object of said Fourier transformation into a plurality of data portions and distributing said data portions to said processors in order to store said data portions in said memories employed for said processors; further dividing each of said data portions distributed to said processors into a first data part and a second data part and driving every individual one of said processors to carry out a process on said first data part of said data portion distributed to said individual processor; and driving every individual one of said processors to carry out a process on said second data part of said data portion distributed to said individual processor and, at the same time, concurrently carry out processing to relocate results of said process carried out on said first data part by reassigning said results among said processors.
 10. An electron-structure computation system for computing an electron structure by application of a fast Fourier transformation implemented by using a parallel-processing computer, said electron-structure computation system adopting a fast Fourier transformation method implementing a fast Fourier transformation in a parallel-processing computer comprising an input apparatus, a processing apparatus, which includes a plurality of processors each having one of memories and a network for transferring data among said memories employed for said processors, an output apparatus and an external storage apparatus, said fast Fourier transformation method comprising the steps of: dividing data serving as an object of said Fourier transformation into a plurality of data portions and distributing said data portions to said processors in order to store said data portions in said memories employed for said processors; further dividing each of said data portions distributed to said processors into a first data part and a second data part and driving every individual one of said processors to carry out a process on said first data part of said data portion distributed to said individual processor; and driving every individual one of said processors to carry out a process on said second data part of said data portion distributed to said individual processor and, at the same time, concurrently carry out processing to relocate results of said process carried out on said first data part by reassigning said results among said processors.